Loading Libraries
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.4 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Problem 1
Instacart Dataset Descriptions
- The Instacart dataset includes 15 variables: order_id, product_id, add_to_cart_order, reordered, user_id, eval_set, order_number, order_dow, order_hour_of_day, days_since_prior_order, product_name, aisle_id, department_id, aisle, department.
- The dataset has 1,384,617 rows and 15 columns.
- Days since the prior order range from 0 to 30.
- The median add_to_cart_order value (the position at which an item was added to the cart) is 7.
- Items were reordered a total of 828,824 times.
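The summary figures above can be reproduced with a few base-R calls. This is a minimal sketch, not the author's code. It runs on a hypothetical four-row data frame that borrows some Instacart column names, so its numbers differ from the real ones. Because reordered is a 0/1 flag, its sum counts the reordered items.

```r
# Hypothetical toy data using a subset of the Instacart column names
orders <- data.frame(
  order_id               = c(1, 1, 2, 3),
  add_to_cart_order      = c(1, 2, 1, 1),
  reordered              = c(1, 0, 1, 1),
  days_since_prior_order = c(9, 9, 30, 0)
)

# Dimensions of the data frame
n_rows <- nrow(orders)
n_cols <- ncol(orders)

# Minimum and maximum days between orders (NA-safe)
days_range <- range(orders$days_since_prior_order, na.rm = TRUE)

# Median position at which an item was added to the cart
median_cart <- median(orders$add_to_cart_order)

# reordered is 0/1, so its sum is the number of reordered items
total_reordered <- sum(orders$reordered)
```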
Aisles: Count and Most Ordered
## # A tibble: 1 × 1
## distinct_aisles
## <int>
## 1 134
- There are 134 different aisles.
Creating a plot
* The aisles with the most items ordered are fresh vegetables (aisle 83), fresh fruits (aisle 24), and packaged vegetables fruits (aisle 123).
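The base-R sketch below shows one way to count distinct aisles and rank them by the number of items ordered. The items data frame is hypothetical, with one row per item ordered plus its aisle name. The original analysis may have used dplyr instead.

```r
# Hypothetical toy data: one row per item ordered, with its aisle name
items <- data.frame(
  aisle = c("fresh vegetables", "fresh fruits", "fresh vegetables",
            "packaged vegetables fruits", "fresh fruits", "fresh vegetables",
            "baking ingredients", "packaged vegetables fruits",
            "fresh fruits", "fresh vegetables")
)

# Number of distinct aisles
n_aisles <- length(unique(items$aisle))

# Items per aisle, largest count first
aisle_counts <- sort(table(items$aisle), decreasing = TRUE)

# Names of the three busiest aisles
top3 <- names(aisle_counts)[1:3]
```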
Instacart: Cleaning and filtering
Instacart: Table
| Aisle | Product | Times ordered | Rank |
|---|---|---|---|
| baking ingredients | Cane Sugar | 336 | 3 |
| baking ingredients | Light Brown Sugar | 499 | 1 |
| baking ingredients | Pure Baking Soda | 387 | 2 |
| dog food care | Organix Chicken & Brown Rice Recipe | 28 | 2 |
| dog food care | Small Dog Biscuits | 26 | 3 |
| dog food care | Snack Sticks Chicken & Rice Recipe Dog Treats | 30 | 1 |
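You can build a ranked table like this in three steps: count orders per aisle and product, rank the counts within each aisle, and keep ranks 1 to 3. The sketch below does this in base R on hypothetical data, where "Product A" and the other names are placeholders. The call rank(..., ties.method = "min") handles ties the same way as dplyr::min_rank().

```r
# Hypothetical toy data: one row per item ordered
items <- data.frame(
  aisle = c(rep("baking ingredients", 10), rep("dog food care", 3)),
  product_name = c(rep("Product A", 4), rep("Product B", 3), rep("Product C", 2),
                   "Product D", rep("Product X", 2), "Product Y")
)

# Count orders for every aisle/product pair and drop empty combinations
counts <- as.data.frame(table(aisle = items$aisle, product_name = items$product_name))
counts <- counts[counts$Freq > 0, ]

# Rank within each aisle; negating Freq makes the largest count rank 1
counts$rank <- ave(-counts$Freq, counts$aisle,
                   FUN = function(x) rank(x, ties.method = "min"))

# Keep the top three products per aisle, ordered by aisle then rank
top <- counts[counts$rank <= 3, ]
top <- top[order(top$aisle, top$rank), ]
```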
Ice Cream: Cleaning and filtering
## `summarise()` has grouped output by 'product_name'. You can override using the `.groups` argument.
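Ice Cream: Table
Mean hour of day at which each product was ordered (order_hour_of_day), by day of week (order_dow 0 to 6):

| Product | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| Coffee Ice Cream | 13.77419 | 14.31579 | 15.38095 | 15.31818 | 15.21739 | 12.26316 | 13.83333 |
| Pink Lady Apples | 13.44118 | 11.36000 | 11.70213 | 14.25000 | 11.55172 | 12.78431 | 11.93750 |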
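The summarise() message suggests this table was built with dplyr, grouping by product and one other variable. The sketch below is a base-R equivalent. It assumes the second variable is order_dow and that the summarised value is the mean of order_hour_of_day. tapply() returns the product-by-day matrix directly, with no reshaping step. The data are hypothetical.

```r
# Hypothetical toy data: day of week (0-6) and hour of day for each order
orders <- data.frame(
  product_name      = c("Coffee Ice Cream", "Coffee Ice Cream", "Coffee Ice Cream",
                        "Pink Lady Apples", "Pink Lady Apples"),
  order_dow         = c(0, 0, 1, 0, 1),
  order_hour_of_day = c(13, 15, 14, 11, 12)
)

# Rows: products; columns: days of week; cells: mean hour of day
mean_hour <- tapply(orders$order_hour_of_day,
                    list(orders$product_name, orders$order_dow),
                    mean)
```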
Problem 2
States observed at 7 or more locations in 2002
States observed at 7 or more locations in 2010
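Finding these states requires counting each location only once per state and year. The minimal base-R sketch below assumes hypothetical columns year, state, and location; the real column names may differ. In the toy data, NJ has 8 rows in 2010 but only 6 distinct locations, so it qualifies only in 2002.

```r
# Hypothetical toy data: one row per (year, state, location) observation
brfss <- data.frame(
  year     = c(rep(2002, 8), rep(2010, 8)),
  state    = c(rep("NJ", 7), "CT", rep("NJ", 8)),
  location = c(paste("NJ county", 1:7), "CT county 1",
               paste("NJ county", c(1:6, 1, 2)))
)

# Keep each (year, state, location) combination once
unique_locs <- unique(brfss[, c("year", "state", "location")])

# Number of distinct locations per year and state
loc_counts <- as.data.frame(table(year = unique_locs$year, state = unique_locs$state))

# States observed at 7 or more distinct locations
states_7plus <- loc_counts[loc_counts$Freq >= 7, ]
```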
Spaghetti plot of "Excellent" responses
## Warning: Removed 30 rows containing missing values (geom_point).
## Warning: Removed 1 row(s) containing missing values (geom_path).

Distribution Plot
## Warning in year == c(2006, 2010): longer object length is not a multiple of
## shorter object length
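This warning signals a filtering bug, not a harmless message. The == operator compares element by element and recycles c(2006, 2010) along year. It therefore does not test whether each year is 2006 or 2010, and some rows from both years are silently dropped. If the intent was to keep both years, %in% performs the membership test. A minimal illustration on a hypothetical vector:

```r
year <- c(2006, 2006, 2010, 2010, 2006)

# Element-wise comparison recycles c(2006, 2010) along `year`:
# year[1] vs 2006, year[2] vs 2010, year[3] vs 2006, and so on.
# R warns because length 5 is not a multiple of length 2.
wrong <- suppressWarnings(year == c(2006, 2010))

# Membership test: TRUE wherever year is 2006 or 2010
right <- year %in% c(2006, 2010)
```

Here wrong keeps only three of the five rows, and right keeps all of them.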

Problem 3
## Rows: 35 Columns: 1443
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): day
## dbl (1442): week, day_id, activity.1, activity.2, activity.3, activity.4, ac...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Accelerometer Dataset Cleaning
Accelerometer Dataset Description
- The Accelerometer dataset includes variables week_day, week, day_id, day, and activity.
- The dataset has 35 rows and 1443 columns.
- The range of activity on day 1 is (, -).
- The total amount of activity on day 1 is 0.
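To get the day 1 range and total, select that day's activity.* columns and call range() and sum(). The minimal sketch below uses a hypothetical two-day data frame with four activity columns; the real data has 1,440.

```r
# Hypothetical toy data: 2 days x 4 activity columns
accel <- data.frame(
  day_id     = c(1, 2),
  day        = c("Friday", "Monday"),
  activity.1 = c(1, 5),
  activity.2 = c(3, 1),
  activity.3 = c(2, 1),
  activity.4 = c(10, 1)
)

# Names of the activity.* columns, matched by prefix
activity_cols <- grep("^activity\\.", names(accel), value = TRUE)

# Activity counts for day 1 as a plain numeric vector
day1 <- unlist(accel[accel$day_id == 1, activity_cols])

day1_range <- range(day1)
day1_total <- sum(day1)
```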
Aggregating Accelerometer Dataset
| day_id | Total activity |
|---|---|
| 1 | 480542.62 |
| 2 | 78828.07 |
| 3 | 376254.00 |
| 4 | 631105.00 |
| 5 | 355923.64 |
| 6 | 307094.24 |
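In the wide layout, each per-day total is a row sum over the activity.* columns. The base-R sketch below uses hypothetical data. The original may instead have pivoted to long format and summed with dplyr.

```r
# Hypothetical toy data: 3 days x 3 activity columns
accel <- data.frame(
  day_id     = c(1, 2, 3),
  activity.1 = c(1, 5, 0),
  activity.2 = c(3, 1, 0),
  activity.3 = c(2, 1.5, 0)
)

# Names of the activity.* columns, matched by prefix
activity_cols <- grep("^activity\\.", names(accel), value = TRUE)

# One row per day with the sum of its activity columns
totals <- data.frame(
  day_id         = accel$day_id,
  total_activity = unname(rowSums(accel[, activity_cols]))
)
```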